Phaeobacter gallaeciensis CIP 105210T (= DSM 26640T = BS107T) is the type strain of the species Phaeobacter gallaeciensis. The genus Phaeobacter belongs to the marine Roseobacter group (Rhodobacteraceae, Alphaproteobacteria). Phaeobacter species are effective colonizers of marine surfaces and are frequently associated with eukaryotes. Strain BS107T was isolated from larval rearing cultures of the scallop Pecten maximus. Here we describe the features of this organism, together with its complete genome sequence, which comprises eight circular replicons with a total of 4,448 genes. In addition to a high number of extrachromosomal replicons, the genome contains six genomic islands and three putative prophage regions, as well as a hybrid between a plasmid and a circular phage. Phylogenomic analyses confirm previous results indicating that the originally reported P. gallaeciensis type-strain deposit DSM 17395 belongs to P. inhibens and that CIP 105210T (= DSM 26640T) is the sole genome-sequenced representative of P. gallaeciensis.



Introduction 

Strain CIP 105210T (= BS107T = DSM 26640T) is the type strain of Phaeobacter gallaeciensis, the type species of Phaeobacter, a genus of marine species of Rhodobacteraceae (Rhodobacterales, Alphaproteobacteria). BS107T was isolated from the scallop Pecten maximus and was initially described as the type strain of Roseobacter gallaeciensis [1]. After comprehensive reclassifications of Rhodobacteraceae genera, BS107T became the type strain of P. gallaeciensis [2]; the genus Phaeobacter currently comprises the species P. gallaeciensis, P. inhibens, P. caeruleus, P. daeponensis, P. leonis and P. arcticus. A recent study [3] revealed that the deposits DSM 17395 and CIP 105210T, reported to be identical, are in fact not identical, and confirmed that strain CIP 105210T represents the original P. gallaeciensis isolate BS107T, which is now deposited in the DSMZ open collection as DSM 26640T. In contrast, strain DSM 17395 was reclassified as a representative of the sister species P. inhibens. Analysis of their similar but distinct metabolic capacities allowed discrimination between the two strains, which were originally reported to represent the same type strain [3]. In the absence of sequenced genomes, this species assignment was essentially based on deviating plasmid profiles and molecular analyses (16S rDNA, ITS, DNA-DNA hybridization), which showed convergent results.

The genus Phaeobacter comprises effective surface colonizers. Comparative analyses of strains DSM 17395 and DSM 24588 (= 2.10) revealed a high level of adaptation to life on surfaces [4]. The production of the characteristic antibiotic tropodithietic acid (TDA) correlates with the formation of a brown pigment that is eponymous for Phaeobacter [1]. Current scientific interest in Phaeobacter is based on the role of its strains as probiotic agents in fish aquaculture [5] and as agents of bleaching diseases in marine red algae [6], as well as on their potential regulatory activity during phytoplankton blooms [7] via so-called roseobacticides [8]. Here we present the complete genome sequence of P. gallaeciensis CIP 105210T together with a summary classification and a set of features, including insights into genome architecture, genomic islands and phages.
Classification and features
16S rRNA gene analysis

Figure 1 shows the phylogenetic neighborhood of P. gallaeciensis CIP 105210T in a tree based on 16S rDNA gene sequences. The four 16S rRNA gene copies in the genome are identical to each other and differ by five nucleotides from the previously published 16S rDNA gene sequence (Y13244 [1]).

A representative genomic 16S rDNA gene sequence of P. gallaeciensis CIP 105210T was compared with the Greengenes database to determine the weighted relative frequencies of taxa and (truncated) keywords as previously described [9], in order to infer the taxonomic and environmental affiliation of the strain. The most frequently occurring genera were Ruegeria (30.2%), Phaeobacter (29.4%), Roseobacter (13.9%), Silicibacter (13.7%) and Nautella (3.6%) (698 hits in total). Regarding the 30 hits to sequences from members of the species, the average identity within HSPs (high-scoring segment pairs) was 99.6%, whereas the average coverage by HSPs was 18.7%. Regarding the 20 hits to sequences from other members of the genus, the average identity within HSPs was 98.0%, whereas the average coverage by HSPs was 18.7%. Among all other species, the one yielding the highest score was P. inhibens (AY177712), which corresponded to a 16S rDNA gene identity of 99.5% and an HSP coverage of 18.6%. (Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was AJ296158 (Greengenes short name 'Spain:Galicia isolate str. PP-154'), which showed an identity of 99.8% and an HSP coverage of 18.7%. The most frequently occurring keywords within the labels of all environmental samples that yielded hits were 'microbi' (2.8%), 'marin' (2.5%), 'coral' (2.4%), 'sediment' (2.0%) and 'biofilm' (1.9%) (509 hits in total). No environmental samples yielded hits with a higher score than the highest-scoring species.
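The weighted relative frequencies and HSP averages mentioned above can be illustrated with a minimal Python sketch. It assumes that each Greengenes hit is reduced to a (label, score) pair and that a hit's weight is simply its score; the function names and the toy values are hypothetical and do not reproduce the exact procedure of [9].

```python
from collections import defaultdict


def weighted_relative_frequencies(hits):
    """Return {label: percent}, weighting each (label, score) hit by its score.

    Score-based weighting is an assumption of this sketch, not necessarily
    the weighting used in the original analysis.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    return {label: 100.0 * weight / grand_total for label, weight in totals.items()}


def mean_identity_and_coverage(hsps):
    """Average percent identity and percent HSP coverage over (identity, coverage) pairs."""
    n = len(hsps)
    mean_identity = sum(identity for identity, _ in hsps) / n
    mean_coverage = sum(coverage for _, coverage in hsps) / n
    return mean_identity, mean_coverage


# Toy input (hypothetical values, not the actual Greengenes hits)
hits = [("Ruegeria", 3.0), ("Phaeobacter", 2.0), ("Ruegeria", 1.0), ("Nautella", 4.0)]
print(weighted_relative_frequencies(hits))
print(mean_identity_and_coverage([(99.5, 18.6), (99.7, 18.8)]))
```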
Figure 1. Phylogenetic tree highlighting the position of P. gallaeciensis relative to the type strains of the other species within the genus Phaeobacter and the neighboring genus Leisingera. The tree was inferred from 1,381 aligned characters of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion as previously described [9]. Ruegeria spp. were included in the dataset for use as outgroup taxa. The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches are support values from 1,000 ML bootstrap replicates (left) and from 1,000 maximum-parsimony bootstrap replicates (right) if larger than 60% [9]. Lineages with type strain genome sequencing projects registered in GOLD [10] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks. Genome sequences are available for P. arcticus (DQ514304) [11], P. inhibens (AY177712) [12], P. caeruleus (AM943630) [13], P. daeponensis (DQ981416) [14], P. gallaeciensis (IMG2545691711, this publication), L. aquimarina (AM900415) [15], L. methylohalidivorans (AY005463) [16] and R. pomeroyi (AF098491) [17].
Morphology and physiology

Cells of BS107T stain Gram-negative and are ovoid rods 0.7-1.0 µm wide and 1.7-2.5 µm long. Motility is achieved by means of a polar flagellum (not visible in Figure 2). Young colonies grown on Marine Broth (MB) at 23°C are 0.5 mm in diameter, circular, smooth, convex and brownish with regular edges [1]. Colonies incubated for 7 days are 2 mm in diameter with irregular edges and produce a brown, diffusible pigment. Cells grow at temperatures between 15 and 37°C; optimal growth was observed between 23 and 27°C. The optimal pH is 7.0, with growth occurring up to pH 10.0 but not below pH 4.0. Cells grow at salt concentrations ranging from 0.1 to 2.0 M NaCl, with 0.2 M being optimal. Thiamine (vitamin B1) is required for growth in minimal medium. Cells exhibit catalase and oxidase activity, but no amylase, gelatinase, β-galactosidase, tweenase, DNase, urease, arginine dihydrolase, lysine decarboxylase or ornithine decarboxylase activity [1].

BS107T is able to use the following substrates as sole carbon and energy source: D-mannose, D-galactose, D-fructose, D-glucose, D-xylose, melibiose, trehalose, maltose, cellobiose, sucrose, meso-erythritol, D-mannitol, glycerol, D-sorbitol, meso-inositol, succinate, propionate, butyrate, γ-aminobutyrate, DL-hydroxybutyrate, 2-ketoglutarate, pyruvate, fumarate, glycine, L-α-alanine, β-alanine, L-glutamate, L-lysine, L-arginine, L-ornithine, L-proline, acetate and leucine. Bacteriochlorophyll a was not detected [1].

The metabolic properties of Phaeobacter gallaeciensis CIP 105210T and the P. inhibens strains DSM 17395, DSM 24588 (= 2.10) and DSM 16374T (= T5T) were compared using the more sensitive Phenotype MicroArray (PM) technology [3]. Using the statistical analysis approaches (clustering and discretization) implemented in the R package opm [18,19], the non-identity of strains CIP 105210T and DSM 17395 could be demonstrated despite their overall similar physiology. Differences were found in the respiration of tyramine, which was positive in DSM 17395 and negative in CIP 105210T, and of butyrate, which was negative in DSM 17395 and positive in CIP 105210T [3]. A summary of the classification and features of CIP 105210T is presented in Table 1.
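The idea behind discretizing Phenotype MicroArray data can be illustrated with a deliberately simple Python sketch: each substrate's curve parameter is called positive or negative against one fixed cutoff, and substrates with discordant calls between two strains are reported. opm offers considerably more elaborate discretization; the single threshold, the function names and the numbers below are hypothetical.

```python
def discretize(values, threshold):
    """Map each substrate to True (respiration positive) if its value reaches the threshold."""
    return {substrate: value >= threshold for substrate, value in values.items()}


def differing_substrates(strain_a, strain_b, threshold):
    """Substrates measured in both strains whose discretized calls disagree."""
    calls_a = discretize(strain_a, threshold)
    calls_b = discretize(strain_b, threshold)
    shared = calls_a.keys() & calls_b.keys()
    return sorted(s for s in shared if calls_a[s] != calls_b[s])


# Hypothetical curve parameters (e.g., maximum curve height); not measured data
dsm_17395 = {"tyramine": 150.0, "butyrate": 20.0, "D-glucose": 200.0}
cip_105210 = {"tyramine": 10.0, "butyrate": 180.0, "D-glucose": 210.0}
print(differing_substrates(dsm_17395, cip_105210, threshold=100.0))
```

A fixed cutoff keeps the example transparent, but it is sensitive to plate-to-plate variation, which is one reason dedicated tools estimate the cutoff from the data.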




Figure 2. Scanning electron micrograph of P. gallaeciensis CIP 105210T. 
Chemotaxonomy

Chemotaxonomic analysis of strain BS107T confirmed ubiquinones as the sole respiratory lipoquinones, with Q10 predominant. Polar lipids consisted of an unidentified phospholipid, two uncharacterized lipids, aminolipids, phosphatidylethanolamine, phosphatidylglycerol and phosphatidylcholine [2].

The major fatty acids are the monounsaturated acids C18:1 ω7c (76.1%) and 11-methyl C18:1 ω7c (6.1%), followed by the hydroxy fatty acid C16:0 2-OH (5.1%) as well as C16:0 (4.0%), Cu-.i (3.1%), C18:0 (2.6%), C10:0 3-OH (2.2%) and C18:1 ω9c (0.9%) [2].



Table 1. Classification and general features of P. gallaeciensis BS107T according to the MIGS recommendations [20] published by the Genome Standards Consortium [21].

MIGS ID | Property | Term | Evidence code a
 | Current classification | Domain Bacteria | TAS [22]
 | | Phylum Proteobacteria | TAS [23]
 | | Class Alphaproteobacteria | TAS [24,25]
 | | Order Rhodobacterales | TAS [25,26]
 | | Family Rhodobacteraceae | TAS [25,27]
 | | Genus Phaeobacter | TAS [1,28]
 | | Species Phaeobacter gallaeciensis | TAS [1]
 | | Subspecific genetic lineage (strain) BS107T | TAS [1]
MIGS-12 | Reference for biomaterial | Ruiz-Ponte et al. 1998 | TAS [1]
 | Gram stain | Gram-negative | TAS [1]
 | Cell shape | ovoid-rod-shaped | TAS [1]
 | Motility | motile, via polar flagella | TAS [1]
 | Sporulation | not reported |
 | Temperature range | 15-37°C, mesophile | TAS [1]
 | Optimum temperature | 23-27°C | TAS [1]
 | Salinity | 0.1-2.0 M NaCl | TAS [1]
MIGS-22 | Relationship to oxygen | aerobe | TAS [1]
 | Carbon source | complex substrates, butyrate, DL-hydroxybutyrate, D-xylose | TAS [1]
 | Energy metabolism | chemoheterotrophic | TAS [1]
MIGS-6 | Habitat | seawater, Pecten maximus | TAS [1]
MIGS-6.2 | pH | 4.0-10.0, optimum 7.0 | TAS [1]
MIGS-15 | Biotic relationship | free living, facultative symbiont | TAS [1]
MIGS-14 | Known pathogenicity | | IDA
MIGS-16 | Specific host | Pecten maximus |
MIGS-18 | Health status of host | not reported |
 | Biosafety level | 1 | TAS [29]
MIGS-19 | Trophic level | heterotroph | TAS [1]
MIGS-23.1 | Isolation | seawater of larval cultures of the scallop Pecten maximus | TAS [1]
MIGS-4 | Geographic location | A Coruña, Galicia, Spain | TAS [1]
MIGS-5 | Time of sample collection | not reported |
MIGS-4.1 | Latitude | 43.3619 |
MIGS-4.2 | Longitude | -8.410 |
MIGS-4.3 | Depth | not reported |
MIGS-4.4 | Altitude | about sea level |

a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [30].
Genome sequencing and annotation



Table 2. Genome sequencing project information

MIGS ID | Property | Term
MIGS-31 | Finishing quality | Finished
MIGS-28 | Libraries used | One standard shotgun library, one 3 kbp paired-end library
MIGS-29 | Sequencing platforms | Roche/454 GS FLX Titanium
MIGS-31.2 | Sequencing coverage | 22×
MIGS-30 | Assemblers | Newbler assembler version 2.6 (Software Release: 2.6 (20110517_1502))
MIGS-32 | Gene calling method | Prodigal 1.4
 | INSDC ID | CP002976.1
 | GenBank Date of Release | 1/31/2014
 | GOLD ID | Gi24053
 | NCBI project ID | 188096
 | Database: IMG | 2531839720
MIGS-13 | Source material identifier | CIP 105210T
 | Project relevance | Tree of Life, carbon cycle, scallop rearing, plasmid



Growth conditions and DNA extraction 

A culture of CIP 105210T was grown aerobically in 100 ml of DSMZ medium 514 [31] on a shaker at 28°C. Genomic DNA was isolated using the Qiagen Genomic DNA Kit, following the standard protocol for Bacteria (500G) provided by the manufacturer. The extracted DNA had a concentration of 200 ng/µl, and its quality was checked with a NanoDrop spectrophotometer.

Genome sequencing and assembly 

The genome of P. gallaeciensis CIP 105210T was sequenced using the Roche/454 GS FLX Titanium sequencing platform (Table 2). A draft assembly based on 247,768 reads of a standard shotgun library and 204,863 reads of a 3 kbp paired-end library (LGC Genomics, Berlin, Germany), with a total of 138 Mb (22-fold coverage), was generated with the Newbler assembler (Roche Diagnostics GmbH, Mannheim, Germany). This assembly consisted of 45 contigs, 26 of which could be joined into 15 scaffolds. Gaps resulting from repetitive sequences were closed by PCR followed by Sanger sequencing, yielding a final genome size of 4,540,155 bp, which consists of one circular chromosome of 3,776,653 bp and seven circular plasmids.

Genome annotation 

Genes were identified using Prodigal [32] as part of the Integrated Microbial Genomes Expert Review (IMG/ER) annotation pipeline [33]. The predicted CDSs were translated and used to search the National Center for Biotechnology Information (NCBI) nonredundant database and the UniProt, TIGRFAM, Pfam, PRIAM, KEGG, COG and InterPro databases.

Genome properties 

The Phaeobacter gallaeciensis CIP 105210T genome statistics are provided in Table 3 and Figures 3a to 3h. The genome consists of eight circular replicons with a total length of 4,540,155 bp and a G+C content of 59.44%. The replicons correspond to a single chromosome (3,776,653 bp) and seven extrachromosomal elements ranging in size from 40,170 bp to 255,493 bp. Of the 4,448 predicted genes, 4,369 are protein-coding genes and 79 are RNA genes. The distribution of genes into COG functional categories is presented in Table 4.
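The percentages in Table 3 follow directly from the raw counts: base-pair values are divided by the genome size and gene counts by the 4,448 total genes. The Python sketch below recomputes a few of them; the helper names are our own, and rounding half-up is an assumption chosen because 3,475 of 4,448 is exactly 78.125%, which Python's built-in round() would report as 78.12 rather than the tabulated 78.13.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(part, total):
    """Percentage of `part` in `total`, rounded half-up to two decimals."""
    value = Decimal(part) * 100 / Decimal(total)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def gc_count(sequence):
    """Number of G and C bases in a nucleotide sequence (case-insensitive)."""
    seq = sequence.upper()
    return seq.count("G") + seq.count("C")


GENOME_SIZE = 4_540_155
TOTAL_GENES = 4_448

# Counts taken from Table 3
print(percent(2_698_552, GENOME_SIZE))  # G+C content
print(percent(4_056_108, GENOME_SIZE))  # DNA coding region
print(percent(4_369, TOTAL_GENES))      # protein-coding genes
print(percent(3_475, TOTAL_GENES))      # genes in paralog clusters
```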
Table 3. Genome statistics

Attribute | Value | % of total a
Genome size (bp) | 4,540,155 | 100.00
DNA coding region (bp) | 4,056,108 | 89.34
DNA G+C content (bp) | 2,698,552 | 59.44
Number of replicons | 8 |
Extrachromosomal elements | 7 |
Total genes | 4,448 | 100.00
RNA genes | 79 | 1.78
rRNA operons | 4 |
tRNA genes | 59 | 1.33
Protein-coding genes | 4,369 | 98.22
Genes with function prediction | 3,595 | 80.82
Genes in paralog clusters | 3,475 | 78.13
Genes assigned to COGs | 3,422 | 76.93
Genes assigned Pfam domains | 3,657 | 82.22
Genes with signal peptides | 457 | 10.27
Genes with transmembrane helices | 975 | 21.92
CRISPR repeats | 0 |
a The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.

Insights into the genome
Unique genes

A search for specific genes in the genome of P. gallaeciensis CIP 105210T compared to the P. inhibens strains DSM 24588 (= 2.10), DSM 16374T (= T5T) and DSM 17395, based on an e-value cutoff of 1e-5 and a minimum identity of 30%, identified a total of 551 specific genes. Of these, 296 (54%) are located on the chromosome and 255 (46%) on extrachromosomal replicons. In comparison with the other completely sequenced strains of the genus Phaeobacter, 8% of the chromosomal and 35% of the extrachromosomal genes of P. gallaeciensis CIP 105210T were unique, reflecting the considerable contribution of extrachromosomal elements to the unique gene content.

The observed distribution may be influenced by the presence of two chromosome-encoded bacterial MobC mobilization proteins (Gal_00154, Gal_01073). MobC, which is missing in all three completely sequenced P. inhibens strains, is part of the relaxosome at the origin of transfer and increases the frequency of plasmid mobilization and therefore of conjugal plasmid transfer [34]. This is consistent with the comparatively large number of seven extrachromosomal replicons present in CIP 105210T.
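The unique-gene search described above can be sketched as a filter over precomputed protein similarity hits. In this Python sketch the input format (query, subject genome, e-value, percent identity), the function name and the gene identifiers are assumptions; a query gene counts as specific when no hit in any comparison genome passes both the 1e-5 e-value cutoff and the 30% identity threshold.

```python
def unique_genes(query_genes, hits, max_evalue=1e-5, min_identity=30.0):
    """Return query genes that lack a qualifying hit in every comparison genome.

    `hits` holds (query_id, subject_genome, evalue, percent_identity) tuples,
    e.g. parsed from tabular BLASTP output (an assumption of this sketch).
    """
    with_homolog = {
        query
        for query, _genome, evalue, identity in hits
        if evalue <= max_evalue and identity >= min_identity
    }
    return [gene for gene in query_genes if gene not in with_homolog]


genes = ["gene_1", "gene_2", "gene_3"]  # hypothetical gene identifiers
hits = [
    ("gene_1", "DSM 17395", 1e-30, 85.0),  # qualifying homolog
    ("gene_2", "DSM 16374", 1e-3, 90.0),   # e-value too high
    ("gene_3", "DSM 24588", 1e-10, 25.0),  # identity too low
]
print(unique_genes(genes, hits))
```

Applied to the counts reported above, the chromosomal share of the 551 specific genes is 296/551, which rounds to 54%.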

The probable function of some of the unique genes is explained below. Genes Gal_01405 and Gal_01407 encode methane monooxygenases (EC 1.14.13.25), which may facilitate the degradation of aromatic compounds and phenols [35]. Gal_01397, a monoamine oxidase, could provide an additional source of ammonium [36].

Unique genes are also provided by phage-like elements. In CIP 105210T, these so-called "morons" (because they add "more on" the genome [37]) comprise, e.g., an ABC-2 family drug transporter (Gal_01752) [38] and a negative regulator of beta-lactamase expression (Gal_02239).

Genomic islands 

Six genomic islands were identified on the chromosome with the web-based IslandViewer system [39]. IslandViewer combines three methods: IslandPick [40], which uses a comparative genomics approach; SIGI-HMM [41], which relies upon deviating codon usage signatures; and IslandPath-DIMOB [42], which identifies genomic islands based on deviating GC content, dinucleotide bias in gene clusters and the presence of island-specific genes such as mobility genes and tRNAs.
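The GC-content criterion used by IslandPath-DIMOB can be illustrated with a simple sliding-window scan. The Python sketch below is not the IslandPath-DIMOB algorithm, which also weighs dinucleotide bias and mobility genes; the window size, step and two-standard-deviation cutoff are arbitrary, hypothetical parameters.

```python
from statistics import mean, pstdev


def window_gc(sequence, window, step):
    """Return (start, GC fraction) for each full window along the sequence."""
    seq = sequence.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        result.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return result


def deviating_windows(sequence, window, step, z_cutoff=2.0):
    """Start positions of windows whose GC fraction lies >= z_cutoff SDs from the mean."""
    profile = window_gc(sequence, window, step)
    values = [gc for _, gc in profile]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # uniform composition: nothing deviates
    return [start for start, gc in profile if abs(gc - mu) / sigma >= z_cutoff]


# Synthetic sequence: a 100-bp AT-rich insert in a GC-rich background
seq = "GC" * 500 + "AT" * 50 + "GC" * 500
print(deviating_windows(seq, window=100, step=100))
```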

Figure 3a. Circular graphical map of the chromosome. From margin to center: Genes on forward strand (colored by COG categories), genes on reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.

Figure 3b. Circular graphical map of the extrachromosomal replicon pGal_A255. From margin to center: Genes on forward strand (colored by COG categories), genes on reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs black), GC content, GC skew.

Island-I, ranging from position 155,977 to 177,667 (21,690 bp), contains a tRNA gene (Phe GAA, Gal_00137) next to a site-specific recombinase XerD (Gal_00138) and the bacterial mobilization protein MobC (Gal_00154; see above). Furthermore, it contains a LysR-family transcriptional regulator (Gal_00160) and an adjacent ABC-type transport system for glycine/proline/betaine. Island-II (422,441 to 434,165; 11,725 bp) mainly consists of hypothetical proteins, but it also contains a large type II restriction enzyme (905 aa, Gal_00442) and another site-specific XerD recombinase (Gal_00444) next to a tRNA for proline (Gal_00445). Island-III (1,085,143 to 1,096,105; 10,962 bp) contains three XerD recombinases in a row (Gal_01065 to Gal_01067), a MobC protein (Gal_01073) and the typical VirD2 relaxase (Gal_01074) as well as the VirD4 coupling protein (Gal_01075) of type IV secretion systems [43], indicating a plasmid-derived origin of this island. Island-IV (1,626,663 to 1,641,677; 15,014 bp) contains an ABC-type cobalt transport system and a XerC recombinase (Gal_01616). Island-V (2,821,359 to 2,848,860; 27,501 bp) consists mainly of regulated TRAP C4-dicarboxylate and ABC-type dipeptide/oligopeptide/nickel transport proteins and also contains the epsilon subunit of DNA polymerase III (Gal_02817). Island-VI (3,328,870 to 3,344,910; 16,040 bp) lies adjacent to an rRNA operon and contains an ABC-type amino acid/amide transport system and an E1 component of the pyruvate dehydrogenase complex (Gal_03286, EC 1.2.4.1).
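The island lengths given above can be recomputed from their coordinates. In this short Python sketch, five of the six lengths equal end minus start, while Island-II's 11,725 bp corresponds to counting both end positions (end minus start plus one); the hypothetical `inclusive` switch makes that convention explicit.

```python
# Chromosomal coordinates of the six genomic islands as given in the text
ISLANDS = {
    "I": (155_977, 177_667),
    "II": (422_441, 434_165),
    "III": (1_085_143, 1_096_105),
    "IV": (1_626_663, 1_641_677),
    "V": (2_821_359, 2_848_860),
    "VI": (3_328_870, 3_344_910),
}


def span_lengths(regions, inclusive=False):
    """Length of each region as end - start (plus one if both ends are counted)."""
    extra = 1 if inclusive else 0
    return {name: end - start + extra for name, (start, end) in regions.items()}


lengths = span_lengths(ISLANDS)
print(lengths)
print(sum(lengths.values()))  # total island length in bp
```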

Phage-like elements 

The presence of phage-like elements was analyzed with the online tool PHAST [44]. The program identified 16 genes representing a gene transfer agent (GTA) [45] and three incomplete clusters of phage-derived genes with sizes between about 15 kb and 40 kb (Table 5).
Figure 3c-h. Circular graphical map of the extrachromosomal replicons: (c) pGal_B134, (d) pGal_C110, (e)
pGal_D78, (f) pGal_E78, (g) pGal_F69, and (h) pGal_G40. From margin to center: Genes on forward strand 
(colored by COG categories), genes on reverse strand (colored by COG categories), RNA genes (tRNAs 
green, rRNAs red, other RNAs black), GC content, GC skew. 
9.44 | Inorganic ion transport and metabolism
Q | 132 | 3.50 | Secondary metabolites biosynthesis, transport and catabolism
R | 451 | 12.00 | General function prediction only
S | 333 | 9.00 | Function unknown
- | 1,026 | 23.07 | Not in COGs

a The total is based on the total number of protein coding genes in the annotated genome.
Table 5. Prophage regions in the chromosome of P. gallaeciensis CIP 105210T a

Region | Length | Completeness | Score | CDS | Coordinates | Specific keyword | GC%
1 | 14.2 kb | Questionable | 80 | 16 | 1,566,488-1,580,693 | Gene transfer agent (GTA) | 64.6
2 | 25.1 kb | Incomplete | 30 | 32 | 1,781,279-1,806,383 | integrase, region invertase, helicase | 56.9
3 | 14.7 kb | Incomplete | 40 | 18 | 1,800,767-1,815,474 | portal protein, head maturation protease | 57.7
4 | 39.6 kb | Incomplete | 60 | 45 | 2,265,763-2,305,412 | integrase, peptidoglycan hydrolase | 58.5

a Completeness, a prediction of whether the region contains an intact or incomplete prophage based on the applied criteria of PHAST; Score, the score of the region based on the applied criteria of PHAST; CDS, the number of coding sequences; Coordinates, the start and end positions of the region on the bacterial chromosome; GC%, the percentage of GC nucleotides in the region.
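A quick consistency check on Table 5 recomputes the region lengths and looks for overlapping coordinates. Treating both end coordinates as part of a region is an assumption of this Python sketch; under it, regions 2 and 3 share 5,617 bp of sequence.

```python
from itertools import combinations

# (region, start, end) from Table 5
PROPHAGE_REGIONS = [
    (1, 1_566_488, 1_580_693),
    (2, 1_781_279, 1_806_383),
    (3, 1_800_767, 1_815_474),
    (4, 2_265_763, 2_305_412),
]


def length_kb(start, end):
    """Region length in kb, rounded to one decimal as in Table 5."""
    return round((end - start) / 1000, 1)


def overlapping_pairs(regions):
    """Pairs of regions sharing positions, with the overlap in bp (inclusive ends assumed)."""
    pairs = []
    for (id_a, start_a, end_a), (id_b, start_b, end_b) in combinations(regions, 2):
        overlap = min(end_a, end_b) - max(start_a, start_b) + 1
        if overlap > 0:
            pairs.append((id_a, id_b, overlap))
    return pairs


print([length_kb(start, end) for _, start, end in PROPHAGE_REGIONS])
print(overlapping_pairs(PROPHAGE_REGIONS))
```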



Extrachromosomal replicons 

Complete genome sequencing of Phaeobacter gallaeciensis CIP 105210T resulted in eight replicons ranging from 40 kb to 3.8 Mb in size. For the seven extrachromosomal replicons, which range in size between 40 kb and 255 kb (Table 6), the circular conformation has been experimentally validated. The extrachromosomal replicons were analyzed as described in [46] and [47]. They contain characteristic replication modules [43] of the RepABC-, DnaA-like-, RepA- and RepB-type, comprising a replicase and a parAB partitioning operon [48]. Plasmid pGal_E78 also contains a replicase that is homologous to those of RepABC-type plasmids, but the partitioning genes repAB are missing. This solitary replicase cannot be classified according to the established scheme [49] and is designated RepC_soli-1a (RepC [50]). The replicases of the other extrachromosomal replicons, which mediate the initiation of replication, are designated according to the established classification scheme [51]. The numbering of specific replicases corresponds to plasmid compatibility groups; replicons must belong to different groups to coexist stably within the same cell [49].
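If each replicase designation marks one compatibility group, stable coexistence requires that no two replicons in a cell share a designation. The Python sketch below applies that simplified rule to the replicases listed in Table 6; the function name is hypothetical, and treating the designation itself as the compatibility group is a simplification.

```python
from collections import defaultdict

# Replicase designations of the extrachromosomal replicons (Table 6)
REPLICASES = {
    "pGal_A255": "DnaA-like-I",
    "pGal_B134": "RepABC-5",
    "pGal_C110": "RepABC-8",
    "pGal_D78": "RepB-I",
    "pGal_E78": "RepC_soli-1a",
    "pGal_F69": "RepA-I",
    "pGal_G40": "RepABC-4",
}


def shared_compatibility_groups(replicases):
    """Return {group: replicons} for groups carried by more than one replicon."""
    groups = defaultdict(list)
    for replicon, group in replicases.items():
        groups[group].append(replicon)
    return {group: sorted(members) for group, members in groups.items() if len(members) > 1}


print(shared_compatibility_groups(REPLICASES))
```

For CIP 105210T the result is empty, as expected for seven stably maintained replicons; adding a hypothetical second RepABC-5 plasmid to the dictionary would produce a non-empty result.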



Table 6. General genomic features of the chromosome and extrachromosomal elements from P. gallaeciensis strain CIP 105210T

Replicon | No. | Replicase | Length (bp) | GC (%) | Topology | No. genes a
Chromosome | 1 | DnaA | 3,776,653 | 60 | circular | 3,703
pGal_A255 | 2 | DnaA-like-I | 255,493 | 58 | circular | 237
pGal_B134 | 3 | RepABC-5 | 133,631 | 60 | circular | 155
pGal_C110 | 4 | RepABC-8 | 109,815 | 56 | circular | 115
pGal_D78 | 5 | RepB-I | 77,876 | 62 | circular | 62
pGal_E78 | 6 | RepC_soli-1a | 77,775 | 55 | circular | 81
pGal_F69 | 7 | RepA-I | 68,752 | 58 | circular | 56
pGal_G40 | 8 | RepABC-4 | 40,170 | 56 | circular | 51

a Deduced from automatic annotation.
The comparison of the extrachromosomal replicons of P. gallaeciensis CIP 105210T and P. inhibens DSM 17395 documents strong conservation and long-range synteny of three replicons. The largest, the 255 kb DnaA-like-I replicon pGal_A255, is slightly smaller than its 262 kb equivalent (NC_018291.1), with which it shares 89% identity at the nucleotide level. The RepB-I type replicon pGal_D78 exactly matches the size of the DSM 17395 replicon (NC_018287.1, 91% identity), whereas the RepA-I type replicon pGal_F69 is slightly larger than its equivalent (65 kb; NC_018288.1, 91% identity). In contrast, RepABC-type replicons are absent from the DSM 17395 genome. However, only two of the four additional plasmids, the RepABC-5 type replicon pGal_B134 and the RepC_soli-1a type replicon pGal_E78, possess type IV secretion systems, which are required for conjugative transfer [52]. Finally, the three replicons pGal_A255, pGal_B134 and pGal_C110 are equipped with stabilizing toxin/antitoxin modules [53] (Table 7).



Table 7. Integrated Microbial Genome (IMG) locus tags of P. gallaeciensis CIP 105210T genes for the initiation of replication, toxin/antitoxin modules and type IV secretion systems (T4SS) required for conjugation.

Replicon | Replicase | Locus tag | Toxin | Antitoxin | VirB4 | VirD4
Chromosome | DnaA | Gal_00001 | | | |
pGal_A255 | DnaA-like-I | Gal_03722 | Gal_03770 | Gal_03771 | |
pGal_B134 | RepABC-5 | Gal_03960 | Gal_03975 | Gal_03974 | Gal_04010 | Gal_03992
pGal_C110 | RepABC-8 | Gal_04107 | Gal_04110 | Gal_04111 | |
pGal_D78 | RepB-I | Gal_04221 | | | |
pGal_E78 | RepC_soli-1a | Gal_04283 | | | Gal_04360 | Gal_04345
pGal_F69 | RepA-I | Gal_04364 | | | |
pGal_G40 | RepABC-4 | Gal_04417 | | | |

Column groups: Replicase and Locus tag, replication initiation; Toxin and Antitoxin, plasmid stability; VirB4 and VirD4, type IV secretion.

Figure 4. Summary and representation of the assortment of replicons in CIP 105210T as previously described in [47]. Bars show the relative frequency of functional classes according to the database of clusters of orthologous groups of proteins (COGs). The cluster dendrogram arranges the replicons according to their overall codon usage. Codon-usage matrices were generated using as yet unpublished scripts for the statistical analysis software R (version 2.15.0) [54]. The hierarchical clustering analysis was conducted using the pvclust function [55] with 'complete' as agglomeration method. Pvclust also returns AU (approximately unbiased) values, in percent, as statistical support for clusters. Support values >95% are given in bold. Asterisks indicate the presence of genes for conjugation (T4SS) as listed in Table 7.
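The codon-usage clustering in Figure 4 was performed in R with pvclust. The following Python sketch illustrates only the underlying idea: per-replicon codon-usage vectors, a correlation-based distance (1 minus Pearson's r, which we assume corresponds to pvclust's default distance) and complete-linkage agglomeration. It performs no multiscale bootstrap, so no AU values are produced, and the toy vectors are hypothetical.

```python
from itertools import product

# All 64 codons in a fixed order
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]


def codon_usage(cds_list):
    """Relative codon frequencies pooled over all in-frame codons of the given CDSs."""
    counts = dict.fromkeys(CODONS, 0)
    for cds in cds_list:
        seq = cds.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in counts:  # skip codons containing ambiguous bases
                counts[codon] += 1
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in CODONS]


def correlation_distance(x, y):
    """1 - Pearson correlation between two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 1.0 - cov / (sx * sy)


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    Returns the merges as (cluster_a, cluster_b, height) tuples in merge order.
    """
    names = list(profiles)
    dist = {
        (a, b): correlation_distance(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members
                height = max(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or height < best[0]:
                    best = (height, i, j)
        height, i, j = best
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy profiles standing in for per-replicon codon-usage vectors
toy = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 5], "C": [4, 3, 2, 1]}
for left, right, height in complete_linkage(toy):
    print(left, right, round(height, 3))
```

In practice, `codon_usage` would be applied to the CDSs of each replicon and the resulting vectors passed to `complete_linkage` as a dictionary keyed by replicon name.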







The 255 kb DnaA-like-I replicon pGal_A255 is largely constituted by genes coding for proteins in COG E "amino acid transport and metabolism" and COG P "inorganic ion transport and metabolism" (Figure 4). The latter category comprises, for example, an Fe3+-siderophore complex (Gal_03846 to Gal_03848); siderophores are ferric-iron chelating agents that facilitate enhanced uptake of this essential element [56]. pGal_A255 furthermore harbors six genes involved in chemotaxis, a tRNA (Gal_03828) and a cluster for the biosynthesis of coenzyme PQQ, a redox cofactor (Gal_03896). The genes for the synthesis of the antibiotic tropodithietic acid (TDA) [57] are clustered on pGal_A255 and comprise tdaA (Gal_03819), tdaB (Gal_03818), tdaC (Gal_03817), tdaE (Gal_03815) and tdaF (Gal_03802). Of all eight replicons, the 134 kb RepABC-5 type plasmid pGal_B134 harbors the most chaperones (COG O, Figure 4), owing to an elevated number of cytochromes and disulfide bond formation proteins. pGal_B134 also carries a dimethyladenosine transferase (Gal_03978) that mediates RNA methylation, as well as a T4S system (Table 7), thus combining genes for epigenetic modification and for conjugation on one plasmid. The RepABC-8 type plasmid pGal_C110 consists mainly of genes for amino acid and carbohydrate transport (COGs E and G) and for the biosynthesis of secondary metabolites (COG Q). COG K (transcription) is also elevated, due to the presence of 15 transcriptional regulators. On the RepB-I replicon pGal_D78, COG K is likewise elevated, owing to the presence of twelve transcriptional regulators and an RNA polymerase (Gal_04277). This replicon also contains genes for siderophore synthetases (Gal_04241 to Gal_04247) and a catalase/peroxidase (Gal_04279). On the RepC_soli-1a plasmid pGal_E78, COG C (energy production and conversion) is represented by pyruvate dehydrogenase E1 and E2 components, which play a role in the citrate cycle and gluconeogenesis. The RepA-I replicon pGal_F69 contains an RTX toxin [58] (Gal_04412) and exhibits a strong accumulation of COG M, "cell wall/membrane/envelope biogenesis". It harbors several polysaccharide export proteins, including a type I secretion system ABC transporter (Gal_04381, Gal_04382), and a complete rhamnose operon [59]. P. gallaeciensis CIP 105210T (= DSM 26640T) forms strong biofilms (unpublished results), and the extrachromosomal 69 kb replicon seems to be responsible for this attached lifestyle, as previously proposed for the P. inhibens strains DSM 17395 and DSM 24588 (2.10) [3]. pGal_G40 represents a hybrid between a plasmid and a circular phage, comparable to the coliphage N15 [60,61]. It contains an N-acyl-L-homoserine lactone (AHL) synthetase (Gal_04460) and a complete repABC operon. This finding draws a direct connection between RepABC-directed replication [49], horizontal gene transfer and AHL-mediated quorum sensing [62].

Genome sequencing of P. inhibens DSM 16374T revealed the presence of the complete dissimilatory nitrate reduction pathway, and anaerobic growth on nitrite has been validated experimentally [12]. The genes of the pathway are located on three different replicons, i.e., the chromosome, the DnaA-like-I type plasmid pInhi_A227 and the RepABC-8 type plasmid pInhi_B88. The genome of the sister species P. gallaeciensis CIP 105210T exhibits conspicuous synteny for the chromosome and three extrachromosomal replicons (DnaA-like-I: pGal_A255 and pInhi_A227; RepB-I: pGal_D78 and pInhi_C78; RepA-I: pGal_F69 and pInhi_D69). However, P. gallaeciensis CIP 105210T lacks a RepABC-8 type plasmid carrying the crucial nitrous oxide reductase (EC 1.7.2.4), and accordingly this strain is unable to grow anaerobically.

Phylogenomic analyses 

The phylogenetic analysis of 16S rRNA gene type-strain sequences places
P. gallaeciensis together with P. caeruleus and P. daeponensis, whereas
P. inhibens forms a cluster with P. leonis and P. arcticus. The two
clusters are set apart from each other, but the 16S rRNA gene tree is
unresolved and does not allow one to infer the evolutionary
interrelationships within this group. Previous results [4] showed that
the reported P. gallaeciensis type-strain deposit DSM 17395 belongs to
P. inhibens and that CIP 105210T (= DSM 26640T) is the authentic type
strain of P. gallaeciensis. Moreover, the genome-sequenced strain ANG1
has been referred to as P. gallaeciensis based on 16S rRNA analyses
[63], but our recent study revealed a well-supported association with
P. caeruleus and P. daeponensis [4]. The relationships between these
Phaeobacter strains have not yet been corroborated using genome
sequences. We therefore used the Genome-to-Genome Distance Calculator
(GGDC) [64] to investigate the affiliation of strain ANG1 and the
genomic similarities between the P. inhibens and P. gallaeciensis
strains with available genome sequences, and we conducted phylogenomic
analyses to address the relationship between P. gallaeciensis and
P. inhibens.

Table 8 shows the digital DNA-DNA hybridization (DDH) similarities
calculated between P. gallaeciensis CIP 105210T or P. inhibens DSM
16374T (T5T) and other Phaeobacter strains. For DDH values < 70%, the
respective query strain would be considered to belong to a different
species than the strain used as the reference [65,66].
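As a minimal illustration of how the 70% boundary sorts the query strains, the sketch below applies it to the Table 8 values. The `classify` helper and the dictionary layout are hypothetical, and the sketch deliberately ignores the reported standard deviations, which a real assessment would take into account.

```python
# Sketch: apply the 70% dDDH species boundary [65,66] to values from Table 8.
# The classify() helper is hypothetical; it ignores the standard deviations.

SPECIES_THRESHOLD = 70.0  # dDDH (%) below which two strains count as different species

# dDDH (%) of three query genomes against the two type-strain references (Table 8)
DDH_TABLE = {
    "P. inhibens DSM 24588 (2.10)": {"P. gallaeciensis": 38.00, "P. inhibens": 79.50},
    "P. inhibens DSM 17395": {"P. gallaeciensis": 38.40, "P. inhibens": 78.70},
    "P. gallaeciensis ANG1": {"P. gallaeciensis": 21.40, "P. inhibens": 21.10},
}


def classify(ddh_by_reference, threshold=SPECIES_THRESHOLD):
    """Return the single reference species reaching the threshold, else None."""
    hits = [ref for ref, value in ddh_by_reference.items() if value >= threshold]
    return hits[0] if len(hits) == 1 else None


for strain, values in DDH_TABLE.items():
    species = classify(values)
    print(f"{strain}: {species or 'neither reference species'}")
```

Run as is, it assigns DSM 24588 (2.10) and DSM 17395 to P. inhibens and places ANG1 in neither reference species, in line with the interpretation given below the table.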



Table 8. DDH similarities with standard deviations between P. gallaeciensis CIP 105210T, P. inhibens DSM
16374T (T5T) and other Phaeobacter strains, calculated in silico with the GGDC server version 2.0 [64]. The
numbers in parentheses are IMG Taxon IDs identifying the genome sequence.



Formula: identities/HSP length [%]
Reference species: [A] P. gallaeciensis DSM 26640T (= CIP 105210T = BS107T); [B] P. inhibens DSM 16374T (T5T)

Query strain (IMG Taxon ID)                                [A]             [B]
P. inhibens DSM 24588 (2.10) (2501651220)                  38.00% ±2.49    79.50% ±2.80
P. inhibens DSM 17395 (2510065029)                         38.40% ±2.50    78.70% ±2.83
P. gallaeciensis ANG1 (2526164696)                         21.40% ±2.34    21.10% ±2.33
P. inhibens DSM 16374T (T5T) (2516653078)                  38.20% ±2.50    100%
P. gallaeciensis DSM 26640T (= CIP 105210T) (2545555837)   100%            38.20% ±2.50
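The "identities/HSP length" formula named in Table 8 can be illustrated with a short sketch. It assumes that the formula pools all high-scoring segment pairs (HSPs) between two genomes and divides the total number of identical positions by the total HSP length; the HSP tuples below are invented, and the GGDC's subsequent conversion of this distance into a DDH estimate with confidence intervals is not reproduced here.

```python
from typing import Iterable, List, Tuple

# Hypothetical HSP record: (identical positions, HSP length), e.g. parsed from BLAST output.
Hsp = Tuple[int, int]


def identities_over_hsp_length(hsps: Iterable[Hsp]) -> float:
    """Total identities divided by total HSP length, pooled over all HSPs."""
    total_identities = 0
    total_length = 0
    for identities, length in hsps:
        if not 0 <= identities <= length:
            raise ValueError("identities must lie between 0 and the HSP length")
        total_identities += identities
        total_length += length
    if total_length == 0:
        raise ValueError("no HSPs supplied")
    return total_identities / total_length


def intergenomic_distance(hsps: Iterable[Hsp]) -> float:
    """Distance as one minus the pooled identity ratio."""
    return 1.0 - identities_over_hsp_length(hsps)


# Made-up HSPs for illustration only (not data from this study)
example_hsps: List[Hsp] = [(950, 1000), (480, 500), (1900, 2000)]
print(round(intergenomic_distance(example_hsps), 4))  # 3330 / 3500 identical -> 0.0486
```

Pooling before dividing, rather than averaging per-HSP identities, keeps long HSPs from being outweighed by many short ones.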



With the exception of P. gallaeciensis ANG1, which based on DDH values
belongs neither to P. gallaeciensis nor to P. inhibens, the analysis
supports the current classification. P. inhibens, with the type strain
DSM 16374T (T5T), includes the strains DSM 17395 and DSM 24588 (2.10),
whereas P. gallaeciensis CIP 105210T (= DSM 26640T) is the sole
representative of P. gallaeciensis analyzed in the current study.
For the phylogenomic analysis, protein sequences from the available
Phaeobacter genomes were retrieved from the IMG website (P. arcticus
DSM 23566T, ID 2516653081; P. caeruleus DSM 24564T (13T), ID
2512047087; P. daeponensis DSM 23529T (TF-218T), ID 2516493020;
P. inhibens DSM 16374T (T5T), ID 2516653078) or from NCBI (P. inhibens
DSM 24588 (2.10), CP002972 - CP002975; Phaeobacter sp. ANG1,
AFCF00000000; P. gallaeciensis CIP 105210T (= DSM 26640T),
AOQA00000000; P. inhibens DSM 17395, CP002976 - CP002979; Phaeobacter
sp. Y4I, ABXF00000000).



These sequences were investigated with the DSMZ phylogenomics pipeline
as previously described [67-70], using NCBI BLAST [71], TribeMCL [72],
OrthoMCL [73], MUSCLE [74], RASCAL [75], GBLOCKS [68] and MARE [76] to
generate gene- and ortholog-content matrices as well as concatenated
alignments of distinct selections of genes.
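The two kinds of data matrix mentioned above can be sketched in a few lines: a gene-content matrix records presence or absence of each gene cluster per genome, and a concatenated alignment (supermatrix) joins the per-cluster alignments end to end. This is a minimal sketch under stated assumptions: clustering and alignment (TribeMCL, OrthoMCL, MUSCLE and the filtering tools) are assumed to have been done already, the input layout is hypothetical, and the toy sequences are invented.

```python
from typing import Dict, List

# Hypothetical input: ortholog cluster id -> {genome name: aligned protein sequence}.
# Clustering and alignment are assumed to have been done upstream.
Clusters = Dict[str, Dict[str, str]]


def gene_content_matrix(clusters: Clusters, genomes: List[str]) -> Dict[str, str]:
    """Presence (1) / absence (0) row per genome, one column per cluster."""
    ids = sorted(clusters)
    return {g: "".join("1" if g in clusters[c] else "0" for c in ids) for g in genomes}


def concatenate_alignments(clusters: Clusters, genomes: List[str]) -> Dict[str, str]:
    """Concatenate per-cluster alignments; genomes absent from a cluster get gaps."""
    rows: Dict[str, List[str]] = {g: [] for g in genomes}
    for c in sorted(clusters):
        aln = clusters[c]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"cluster {c!r} is empty or not aligned")
        width = widths.pop()
        for g in genomes:
            rows[g].append(aln.get(g, "-" * width))
    return {g: "".join(parts) for g, parts in rows.items()}


clusters = {
    "og1": {"A": "MKV-L", "B": "MKVIL"},
    "og2": {"A": "GG", "C": "GA"},
}
genomes = ["A", "B", "C"]
print(gene_content_matrix(clusters, genomes))     # {'A': '11', 'B': '10', 'C': '01'}
print(concatenate_alignments(clusters, genomes))  # {'A': 'MKV-LGG', 'B': 'MKVIL--', 'C': '-----GA'}
```

Filling missing genes with gaps is one common choice for a "full" supermatrix. A core-genes matrix would instead keep only clusters present in every genome.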

Maximum-likelihood (ML) [77] and maximum-parsimony (MP) [78,79] trees
were inferred from the data matrices with RAxML [80,81] and PAUP* [82],
respectively, as previously described [68,70,72,83].

[Figure 5: phylogenomic tree of Phaeobacter genomes. Taxa: P. gallaeciensis (DSM 26640T / CIP 105210T); P. inhibens (DSM 24588 / 2.10*); P. inhibens (DSM 17395*); P. inhibens (DSM 16374T / T5T); P. arcticus (DSM 23566T); Phaeobacter sp. (ANG1*); P. caeruleus (DSM 24564T / 13T); Phaeobacter sp. (Y4I); P. daeponensis (DSM 23529T / TF-218T).]
Figure 5. Phylogenetic tree inferred from the MARE-filtered supermatrix under the maximum-likelihood (ML)
criterion [77] and rooted using the midpoint-rooting approach [84]. The branches are scaled in terms of the
expected number of substitutions per site. Numbers above the branches (from left to right) are bootstrap support
values [85] (if greater than 60%) from the ML/MP MARE-filtered supermatrix; ML/MP unfiltered (full) supermatrix;
ML/MP core-genes supermatrix; ML/MP gene-content matrix; ML/MP ortholog-content matrix. Values larger
than 95% are shown in bold; dots indicate branches with maximum support under all settings. Genomes
marked with stars have been renamed according to this study and [3].
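The branch-label convention described in the caption (ten slash-separated values in a fixed order, "-" for values of 60% or less, a dot for maximum support under all settings) can be made concrete with a small sketch. The `support_label` helper is hypothetical. Bold formatting for values above 95% is not reproduced in plain text, and the inputs below 60 in the example are invented to show how they are masked.

```python
from typing import List, Optional

# Order of the ten support values given in the Figure 5 caption
SETTINGS = [
    "ML MARE", "MP MARE", "ML full", "MP full", "ML core", "MP core",
    "ML gene-content", "MP gene-content", "ML ortholog-content", "MP ortholog-content",
]


def support_label(values: List[Optional[float]], min_shown: float = 60.0) -> str:
    """Build a branch label; None marks a branch absent from that tree."""
    if len(values) != len(SETTINGS):
        raise ValueError(f"expected {len(SETTINGS)} values, got {len(values)}")
    if all(v is not None and v >= 100.0 for v in values):
        return "•"  # maximum support under all settings
    return "/".join(f"{v:g}" if v is not None and v > min_shown else "-" for v in values)


print(support_label([100, 94, 100, 83, 55, None, 40, None, None, None]))
# 100/94/100/83/-/-/-/-/-/-
```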



The results of the phylogenomic analyses are shown in Figure 5. The
"full" and MARE-filtered supermatrix trees were topologically identical,
and the tree from the latter analysis is shown in Figure 5 together with
ML and MP bootstrap support values from all analyses if larger than 60%.
The tree inferred from the core-gene matrix showed a distinct grouping
within Phaeobacter inhibens, i.e. P. inhibens DSM 17395 as sister of the
clade comprising P. inhibens DSM 16374T (T5T) and P. inhibens DSM 24588
(2.10). The topologies of both the MP and ML "full" and MARE-filtered
supermatrix trees were identical, whereas the MP core-genes tree was
topologically identical to the ML core-genes tree. The gene-content and
ortholog-content MP trees were topologically identical to each other and
showed P. inhibens DSM 16374T (T5T) as a sister taxon of P. inhibens
DSM 24588 (2.10) and P. inhibens DSM 17395. Only the ML gene-content and
ortholog-content trees deviated regarding the species boundaries, showing
a clade comprising P. inhibens DSM 16374T (T5T) and P. gallaeciensis
CIP 105210T (= DSM 26640T) as well as a clade comprising P. inhibens
DSM 24588 (2.10) and P. inhibens DSM 17395.

http://standardsingenomics.org

Thus, the analyses supported the earlier conclusion [3] that DSM 17395
belongs to P. inhibens. The analyses also confirmed that
P. "gallaeciensis" ANG1 belongs neither to P. gallaeciensis nor to
P. inhibens and might therefore represent a novel, not yet named seventh
species in the genus Phaeobacter. Further, the analysis confirms
P. gallaeciensis CIP 105210T (= DSM 26640T) as the sole genome-sequenced
representative of the species Phaeobacter gallaeciensis.